Bridging weak supervision and privacy aware learning via sufficient statistics

Authors

  • Giorgio Patrini
  • Frank Nielsen
  • Richard Nock
Abstract

We present a first attempt at connecting two areas of statistical learning that have so far shared little common ground: weakly supervised learning and privacy-aware learning. In the former, we aim to learn models of labeled data when full label information is not available; the latter concerns the design of algorithms with privacy guarantees that protect the data while trading off utility for learning. We focus on classification with linear separators. For a broad set of loss functions, it is known that a sufficient statistic, the mean operator, summarizes all the information carried by the label variable. Learning algorithms have exploited this property to overcome the lack of label knowledge, for example by learning from label proportions only. We extend this result with almost no structural assumptions on loss functions and regularizers, and show how the approach is potentially viable for any weakly supervised task. Further, we consider the label as the only sensitive variable to protect, while the rest of the data is in the public domain. In this scenario, we propose a simple method based on the Laplace mechanism that obfuscates the mean operator and feeds it to a learning algorithm which (a) enjoys α-label differential privacy, (b) is characterized by a generalization bound under almost no structural assumptions and (c) can be integrated into a secure data-sharing protocol for learning. Remarkably, some known results are recovered with simplified proofs.
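As a hedged illustration of the two ideas in the abstract, the Python sketch below assumes the logistic loss (the abstract does not fix a particular loss) and checks that the empirical risk of a linear classifier θ can be computed from the unlabeled features plus the mean operator μ = (1/m) Σ_i y_i x_i alone. It then perturbs μ with Laplace noise. The noise calibration (L1 sensitivity 2·max_i ||x_i||_1 / m for a single label flip, noise scale equal to sensitivity/α, features treated as public) is an assumption made for illustration, not the paper's stated mechanism, and every function name here is hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def logistic_loss(z: float) -> float:
    """Numerically stable log(1 + exp(-z))."""
    return max(0.0, -z) + math.log1p(math.exp(-abs(z)))


def mean_operator(xs: Sequence[Sequence[float]], ys: Sequence[int]) -> Vector:
    """mu = (1/m) * sum_i y_i * x_i, with labels y_i in {-1, +1}."""
    m, d = len(xs), len(xs[0])
    return [sum(y * x[j] for x, y in zip(xs, ys)) / m for j in range(d)]


def risk_with_labels(theta: Sequence[float], xs, ys) -> float:
    """Ordinary empirical logistic risk; needs every individual label."""
    return sum(logistic_loss(y * dot(theta, x)) for x, y in zip(xs, ys)) / len(xs)


def risk_from_mean_operator(theta: Sequence[float], xs, mu: Sequence[float]) -> float:
    """Same risk, computed from the features and the mean operator only.

    Uses logistic_loss(z) - logistic_loss(-z) == -z, hence for y in {-1, +1}:
    logistic_loss(y*z) == 0.5*(loss(z) + loss(-z)) - 0.5*y*z.
    """
    symmetric = sum(
        0.5 * (logistic_loss(dot(theta, x)) + logistic_loss(-dot(theta, x))) for x in xs
    ) / len(xs)
    return symmetric - 0.5 * dot(theta, mu)


def laplace_noisy_mean_operator(xs, ys, alpha: float, rng: random.Random) -> Vector:
    """Laplace mechanism on mu (ASSUMED calibration; features treated as public).

    Flipping one label y_i changes mu by 2*y_i*x_i/m, so the L1 sensitivity is
    2 * max_i ||x_i||_1 / m and the per-coordinate noise scale is sensitivity / alpha.
    """
    m = len(xs)
    sensitivity = 2.0 * max(sum(abs(v) for v in x) for x in xs) / m
    scale = sensitivity / alpha
    mu = mean_operator(xs, ys)
    if scale == 0.0:
        return mu
    # The difference of two i.i.d. Exponential(rate 1/scale) draws is Laplace(0, scale).
    return [mj + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale) for mj in mu]


if __name__ == "__main__":
    xs = [[1.0, 0.5], [-0.3, 2.0], [0.8, -1.2], [-1.5, -0.4]]
    ys = [1, -1, 1, -1]
    theta = [0.7, -0.2]
    mu = mean_operator(xs, ys)
    # Both values agree up to floating-point rounding.
    print(risk_with_labels(theta, xs, ys), risk_from_mean_operator(theta, xs, mu))
    print(laplace_noisy_mean_operator(xs, ys, alpha=1.0, rng=random.Random(0)))
```

The design choice worth noting is that the labels enter `risk_from_mean_operator` only through the linear term -½ θᵀμ, so any learner that minimizes this form never touches individual labels; replacing μ with its noisy version is then the only change needed to move from the weakly supervised view to the label-private one.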


Similar resources

Estimation from Indirect Supervision with Linear Moments

In structured prediction problems where we have indirect supervision of the output, maximum marginal likelihood faces two computational obstacles: non-convexity of the objective and intractability of even a single gradient computation. In this paper, we bypass both obstacles for a class of what we call linear indirectly-supervised problems. Our approach is simple: we solve a linear system to es...


Weak Supervision for Semi-supervised Topic Modeling via Word Embeddings

Semi-supervised algorithms have been shown to improve the results of topic modeling when applied to unstructured text corpora. However, sufficient supervision is not always available. This paper proposes a new process, Weak+, suitable for use in semi-supervised topic modeling via matrix factorization, when limited supervision is available. This process uses word embeddings to provide additional...


Bandit Label Inference for Weakly Supervised Learning

The scarcity of data annotated at the desired level of granularity is a recurring issue in many applications. Significant amounts of effort have been devoted to developing weakly supervised methods tailored to each individual setting, which are often carefully designed to take advantage of the particular properties of weak supervision regimes, form of available data and prior knowledge of the t...


Nursing Students’ Perspectives on Actual and Ideal Support and Supervision in Clinical Learning Environments in Zanjan University of Medical Sciences in 2011

Introduction: Clinical learning environment has an important role in clinical learning of nursing students. Any differences between students’ perspectives in expected and actual environment may result in decreased clinical learning. Therefore, the present study aimed to compare nursing students’ perspectives on actual and ideal support and supervision in clinical setting. Methods: In this desc...


Privacy-Preserving Bayesian Network Learning From Heterogeneous Distributed Data

In this paper, we propose a post randomization technique to learn a Bayesian network (BN) from distributed heterogeneous data, in a privacy sensitive fashion. In this case, two or more parties own sensitive data but want to learn a Bayesian network from the combined data. We consider both structure and parameter learning for the BN. The only required information from the data set is a set of su...



Publication date: 2015